We describe in this paper a television system in which only a fraction of
the picture elements is sampled. A movement detecting circuit examines the 
frame-to-frame difference signals and divides the picture into "moving" 
and "stationary" areas. In stationary areas the unsampled elements retain 
their value from the previous frame; in moving areas the unsampled 
elements take interpolated values from the current frame. When one-half 
of the elements is sampled, the resulting pictures are difficult to distinguish 
from the original, fully sampled, picture. When the fraction is only one 
quarter, the degradation is visible and disturbing. We mention several 
possible improvements. 

I. INTRODUCTION 

Television coders are usually designed to meet simultaneously the 
worst contingencies with respect to contrast, sharpness and movement. 
That is, a fast moving subject can be reproduced with the full spatial 
resolution afforded a stationary subject and with the full contrast 
resolution afforded a low detail subject. By contrast resolution we 
mean the accuracy with which the coded signal represents the ampli- 
tude of the input signal for a particular picture element. 

We want to reduce the channel capacity required for transmitting 
television signals by coding the signal so that full spatial resolution is 
only available in stationary areas of the picture and full temporal 
resolution is only available in moving areas of the picture. 

Exchange of spatial and contrast resolution has been appreciated 
and demonstrated for some time. 1 One of the first examples of a coder 
that exchanged spatial and contrast resolution was demonstrated by 
E. R. Kretzmer 2 ; more quantizing levels were assigned to low fre- 
quency signal components and fewer levels were assigned to high fre- 
quency signal components. 
A. J. Seyler 1 has pointed out that this principle can be extended to
exchanging temporal and spatial resolution because we can tolerate 
blurring of moving objects. This is evident because the television 
camera integrates the video signal on the target for 1/30th of a second.
Thus, if the object being imaged on the target moves during this 
period, the image will be blurred and resolution will be destroyed in 
the direction in which the object moves. The amount of blurring can 
be quite large. For example, if the object moves at a speed which 
would require two seconds to cross the television screen, then in one 
frame-time a picture element would average light from eight different 
picture elements on the object (at broadcast television rates). Conse- 
quently, if the object is moving in the horizontal direction, the hori- 
zontal resolution is reduced approximately eightfold; there is even 
more reduction if there is camera lag. 
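
The eightfold figure can be checked with a short calculation. The sketch below assumes roughly 480 picture elements across a broadcast scan line; that number is an illustrative assumption and is not given in the text. The function name is likewise hypothetical.

```python
def pels_blurred_per_frame(pels_per_line, crossing_time_s, frame_rate_hz=30.0):
    """Number of picture elements an object traverses during one frame time."""
    frames_to_cross = crossing_time_s * frame_rate_hz
    return pels_per_line / frames_to_cross


# Assumed, not from the text: about 480 elements per broadcast scan line.
# An object that crosses the screen in two seconds then smears across
# 480 / 60 = 8 elements in one frame time.
broadcast_blur = pels_blurred_per_frame(480, 2.0)
```

With the 248-element lines of the system described later in the paper, the same object would smear across about four elements per frame.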

It is not clear whether people tolerate such blurring because of the 
psycho-physical integration of moving objects 3 or because of long 
exposure to television, but the fact that we do tolerate this loss in 
resolution means that we should be able to reduce the spatial sampling 
rate in moving areas of a television picture without degrading picture 
quality. We refer to sampling at a reduced rate along a scan line (or 
along a vertical line) as spatial subsampling. The unsampled picture 
elements are replaced by interpolating between the sampled elements. 

If an object moves slowly enough, temporal resolution can be 
greatly reduced without impairing picture quality. A simple method 
of achieving this is by frame repeating, in which one new frame of 
information out of n is transmitted and for the remaining (n - 1)
frames this one frame is just repeated. 4 As would be expected, if the
subject moves fast enough, the perceived motion is very jerky. In
alternative, and more pleasing, methods for reducing temporal resolu-
tion, 1/nth of the points are replenished in each frame in a dot inter-
laced fashion. 4-6 The type of picture degradation for these schemes
differs from frame repeating in that when a subject moves, the edges 
become blurred and exhibit a checkerboard texture. We refer to sampl- 
ing a given picture element at the reduced frame rate as temporal 
subsampling. 

By temporally subsampling in stationary areas and spatially sub- 
sampling in moving areas we should be able to reduce the channel 
capacity required to transmit a satisfactory television signal. We 
have made a digital television system in which the picture is divided 
into stationary and moving areas which are appropriately subsampled. 
This apparatus, the resulting pictures and the saving of required chan-
nel capacity are described below.

II. APPARATUS 

An outline schematic is shown in Figure 1. The scan format is 30 
frames per second with a 2:1 interlace. Each frame has 271 lines, each 
comprising 248 picture elements. The television signal source is a 
vidicon camera; the video signal is digitized to 8-bit accuracy in the 
Pulse Code Modulation (PCM) encoder and stored in the digital 
ultrasonic wire delay lines of the frame memory. The output of the 
frame memory is subtracted from the PCM encoder output to yield 
a frame difference signal for each picture element. These signals are 
examined in the movement detector whose output determines whether 
the current element (entering the area) is to be regarded as being in a 
moving or stationary area. The output changes from a "stationary 
state" (s = 0) to a "moving state" (s = 1) when more than n frame 
difference signals out of a sequence of m exceed a threshold. The output 
will return to the stationary state when all m samples exhibit insignif- 
icant frame difference signals. 

Thus the movement detector exhibits hysteresis, which reduces the fre-
quency of mode changes so that the picture is segmented into rela-
tively few contiguous moving and stationary areas.
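
The switching rule can be sketched in a few lines. This is a minimal illustration, not the circuit used in the apparatus: it assumes the "sequence of m" is the m most recent elements along the scan line, and the default values (threshold of 4, m = 8, n = 3 so that four or more out of eight trigger the moving state) are taken from the settings quoted in the procedure section. The function name is hypothetical.

```python
from collections import deque


def detect_movement(frame_diffs, threshold=4, n=3, m=8):
    """Return one state per element: 0 = stationary, 1 = moving.

    Assumption: the detector looks at the m most recent frame
    differences along the line (a sliding window).
    """
    window = deque(maxlen=m)   # True where |difference| exceeds the threshold
    state = 0
    states = []
    for diff in frame_diffs:
        window.append(abs(diff) > threshold)
        significant = sum(window)
        if state == 0 and significant > n:
            state = 1          # more than n of the last m are significant
        elif state == 1 and significant == 0:
            state = 0          # all of the last m are insignificant
        states.append(state)
    return states
```

Because the two switching conditions differ, an isolated large difference does not start a moving area, and a moving area persists until a full window of insignificant differences has passed.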

Fig. 1. System for resolution exchange. (Block diagram; legible labels: TV camera, digitizer, movement detector, resolution reduction with spatial and temporal branches, to display.)

In areas judged to be stationary, alternate elements in each line are
sampled according to the pattern of Fig. 2a where the elements tabu-
lated A are sampled in one frame and the elements tabulated B are
sampled in the next. The unsampled elements retain the value of
the previous frame. In the moving areas the same sampling pattern
could be used, but we found that the pattern shown in Fig. 2b gave 
a better picture; the unsampled elements (marked O) are given the
mean value of the two neighboring elements. When the sampling rate 
along the line is reduced, we rely on the blurring caused by the motion 
to bandlimit the video signal. A bandlimiting filter should also be 
used on the output signal. However, with the sampling pattern of Fig. 
2b linear interpolation between the sampled elements gave satisfactory 
pictures although aliasing patterns were just visible when the subject 
moved at a certain speed. Further filtering removed these patterns but 
caused blurring which we judged to be more objectionable. 
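
The two 2:1 modes can be illustrated on a single scan line. The sketch below is an assumption-laden simplification rather than the hardware described here: it works on one line only, takes the per-element moving/stationary decision as given, and the function and parameter names are hypothetical.

```python
def reconstruct_line(current, previous, moving, frame_index, line_phase=0):
    """Rebuild one displayed line from 2:1 subsampled data.

    current     -- element values of the new frame (only sampled ones are read)
    previous    -- the line as displayed in the previous frame
    moving      -- per-element state, 1 = moving, 0 = stationary
    frame_index -- selects the A or B phase in stationary areas (Fig. 2a)
    line_phase  -- 0 or 1, offsets the pattern on alternate lines
    """
    width = len(current)
    out = list(previous)            # unsampled stationary elements keep old values
    sampled = [False] * width
    for x in range(width):
        if moving[x]:
            # Fig. 2b: the same alternate elements are sampled in every frame.
            is_sampled = (x + line_phase) % 2 == 0
        else:
            # Fig. 2a: A elements in one frame, B elements in the next.
            is_sampled = (x + line_phase + frame_index) % 2 == 0
        if is_sampled:
            out[x] = current[x]
            sampled[x] = True
    for x in range(width):
        if moving[x] and not sampled[x]:
            # Mean of the two neighboring elements; edges reuse the one neighbor.
            lo = x - 1 if x > 0 else min(x + 1, width - 1)
            hi = x + 1 if x + 1 < width else max(x - 1, 0)
            out[x] = (out[lo] + out[hi]) / 2
    return out
```

In a fully stationary line, two consecutive calls with `frame_index` 0 and 1 restore every element; in a fully moving line, a linear ramp is reproduced exactly by the interpolation.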

Strictly speaking, frame-to-frame comb filters should be used in
"stationary" areas because of the reduced temporal sampling rate but 
such filters would involve frame delays and were felt to be impractical. 
The aforementioned checkerboard texture which can be seen on moving 
edges in temporally subsampled areas is probably due to the resulting 
aliasing. The experiments carried out with this apparatus and the 
resulting pictures are described in the next two sections. 

In a second series of experiments only every fourth element was 
sampled and the sampling patterns for the stationary and moving 
areas are shown in Figs. 2c and 2d. Only fast movement can now 
sufficiently bandlimit the video signal and so there are switched filters 
before and after the subsampling. 

Some experiments were also conducted with a 4-bit frame difference 
quantized signal. The characteristics of the quantizer are: 

INPUT LEVELS (±)     0,1    2-5    6-11    12-17    18-27    28-37    38-52    54-255

OUTPUT LEVELS (±)     0      2      8       14       22       32       44       60
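
As an illustration, the quantizer characteristic can be written as a table lookup. This is a sketch under one stated assumption: the printed table leaves input magnitude 53 unassigned, and the code below places it in the 38-52 range. The names are hypothetical.

```python
import bisect

# Lower edge of each input-magnitude range and its output level, from the table.
LOWER_EDGES = [0, 2, 6, 12, 18, 28, 38, 54]
OUTPUT_LEVELS = [0, 2, 8, 14, 22, 32, 44, 60]


def quantize_frame_difference(diff):
    """Map a frame difference in -255..255 to one of the signed output levels."""
    magnitude = min(abs(diff), 255)
    # Assumption: magnitude 53, absent from the table, falls in the 38-52 range.
    index = bisect.bisect_right(LOWER_EDGES, magnitude) - 1
    level = OUTPUT_LEVELS[index]
    return level if diff >= 0 else -level
```

The eight magnitudes with their signs give fifteen distinct output values, which fit in a 4-bit word.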

III. PROCEDURE 

Each scene viewed by the television camera was a head and shoulders 
view of a person talking; the movement varied from 'gentle' (only 
lip movement) to 'very active' (arm movement and the person getting 
up and walking away). 

We viewed the resulting pictures on a television monitor. The 
unblanked raster size was 5f inches horizontal by 5 inches vertical. 
The viewing distance was approximately 40 inches and the ambient
illumination was about the average office illumination (70 foot-
candles).

                                                        AOOOAOOOA
                                                        AOOOAOOOA
ABABABABA          AOAOAOAOA         ACBDACBDA          OOAOOOAOO
ABABABABA          AOAOAOAOA         ACBDACBDA          OOAOOOAOO
BABABABAB          OAOAOAOAO         BDACBDACB          AOOOAOOOA
BABABABAB          OAOAOAOAO         BDACBDACB          AOOOAOOOA

(a) TEMPORAL 2:1   (b) SPATIAL 2:1   (c) TEMPORAL 4:1   (d) SPATIAL 4:1

Fig. 2. Sampling patterns. A, B, C and D refer to consecutive frames. O denotes
the average of neighboring, sampled elements.

For each scene we first viewed the picture resulting from the 8-bit 
PCM signal and then the processed pictures in the following order: 
continuously temporally subsampled; continuously spatially sub-
sampled; temporally subsampled in stationary areas and spatially sub-
sampled in moving areas. As a check we sometimes 'flagged' the area
judged moving by the movement detector. For most work the detector
was adjusted to switch to spatial subsampling when four out of eight
picture elements (pels) examined exhibited frame-to-frame difference
signals exceeding a threshold of four (out of 255 levels); it returned
to the temporal subsampling mode when none of the eight examined pels
exhibited a frame difference exceeding this threshold.

For recording still photographs a model head was swung as a 
pendulum bob so that the object speed would be known. One television 
frame (selected at the bottom of the swing) was stored, displayed 
continuously and photographed with an exposure time of ^ second. 

IV. RESULTS 

The pictures which were temporally subsampled (by a factor of 
2) all over, were not only excellent for stationary scenes but also 
satisfactory for scenes with an object (e.g., a head) speed up to one pel 
per frame interval for an object of normal contrast. At higher speeds 
the checkerboard pattern already referred to and described by others 4 
became visible; at object speeds of two pels per frame interval and 
above, this pattern was annoying for most scenes. 

When the pictures were spatially subsampled by a factor of two 
all over, the loss of resolution was just visible for most scenes; the 
loss was more obvious when viewing graphical material. 

When the sampling frequency was halved temporally in stationary 
areas and spatially in moving areas, the resulting pictures were 
usually indistinguishable from the fully sampled picture. The scenes
where the difference was visible contained a contrasty edge moving
slowly (one pel per frame interval or less); in this case the continuity
of the edge was disturbed. This effect is caused by erratic movement 
detection so that parts of the edge are sometimes spatially subsampled 
and sometimes temporally subsampled. More sophisticated movement 
detection should reduce this edge breakup by changing modes only 
when the blurring due to temporal subsampling is comparable with 
the blurring due to spatial subsampling. 

Some still photographs are shown in Fig. 3. In Fig. 3a the whole 
scene is stationary and the picture is fully sampled. In Fig. 3b the 
head is moving at about three pels per frame interval (estimated 
accuracy +0, — ^ pels per frame interval) ; the picture is still fully 
sampled and the blurring introduced by the movement can be seen 
by comparison with Fig. 3a. Fig. 3c shows an equivalent scene with 
a 2:1 exchange of resolution. There is very little difference between 
this picture and the fully sampled one (Fig. 3b). Fig. 3d shows the 
bright flags indicating that part of the scene judged to be moving 
during one frame. 

The results using 4:1 subsampling were less encouraging. With over-
all 4:1 temporal subsampling (according to the pattern of Fig. 2c)
the blurring and checkerboard patterns were objectionable even in 
such slowly moving areas as someone's mouth when he was talking. 
In the pictures resulting from 4:1 spatial subsampling (according to 
the pattern in Fig. 2d), the blurring was again objectionable, espe- 
cially in stationary areas. 

In the pictures resulting from exchanging resolution according to 
the movement detector, the breakup of edges was much more apparent 
and the movement detection was erratic; even some parts of the back-
ground or other stationary areas were judged moving. These errors
arise because the digital filter in the loop gives rise to frame difference 
signals which may be interpreted as movement by the movement 
detector. This effect keeps the digital filter switched in even in sta- 
tionary areas. Again, more sophisticated movement detection should 
reduce this effect. 

The pictures coded as a frame difference quantized signal (instead 
of 8-bit PCM) were usually similar to those coded as 8-bit PCM. The 
only difference was with scenes containing fast movement (speeds of 
four pels per frame interval or more) of contrasty objects; these 
objects appeared somewhat noisier than in those pictures coded as 
eight-bit PCM. 

Fig. 3. (a) Fully sampled picture showing stationary head. (b) Fully sampled
picture showing moving head. (c) Subsampled (2:1) picture using exchanged
resolution. (d) Flags showing area deemed moving. Glossy prints of this figure
may be obtained by writing to the authors.



V. DISCUSSION 



5.1 Comparison of 2:1 and 4:1 Subsampling 

Although the simple techniques described above yielded a satis- 
factory picture when the overall sampling rate was halved, the 
equivalent techniques did not give a satisfactory picture when the 
sampling rate was quartered. The unpleasant effect of the latter tech-
nique appears mostly at the boundaries between areas treated differ-
ently. However, there are many possibilities using a 4:1 ratio reduc-
tion that remain to be tried:

(i) Use an intermediate mode in which the second of the three
unsampled elements is assigned the value from the previous frame,
and the first and third are assigned the average value of their
neighbors. This is a simultaneous reduction by a factor of two in
both spatial and temporal resolution and should result in a less
visible boundary between areas encoded in the different modes.
(ii) Modify the spatial subsampling patterns so that vertical and
horizontal resolution is reduced equally. At present the horizontal
loss is more than the vertical loss.
(iii) Use an improved movement detector which will indicate the
local speed of movement so that modes are not switched until the
blurring is similar in each mode.
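
The first of these suggestions was not tried, so the following is only one possible reading of it, sketched for a single scan line. It assumes that every fourth element is sampled, that the middle unsampled element holds its previous-frame value, and that "neighbors" means the immediately adjacent elements on the line. The function name is hypothetical.

```python
def intermediate_mode_line(current, previous):
    """One reading of the proposed intermediate 4:1 mode for a single line."""
    width = len(current)
    out = list(previous)                 # middle elements keep previous-frame values
    for x in range(0, width, 4):
        out[x] = current[x]              # every fourth element is sampled
    for x in range(width):
        if x % 4 in (1, 3):
            # First and third unsampled elements: mean of the adjacent elements,
            # one freshly sampled and one held from the previous frame.
            right = out[x + 1] if x + 1 < width else out[x - 1]
            out[x] = (out[x - 1] + right) / 2
    return out
```

Each interpolated element then mixes one current sample with one held value, which is the sense in which spatial and temporal resolution are both halved.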

5.2 Combination of Subsampling with Other Coding Techniques

Much digital television employs coders which give either element 
differential PCM in which the interelement difference along a line is 
quantized 7 or frame differential PCM in which the difference between 
consecutive frames is quantized. 8 - 9 

If the information describing when to switch modes can be trans- 
mitted during the blanking interval, then we could use a 4-bit frame 
difference PCM signal combined with 2:1 resolution exchange for 
encoding television signals. In our scan format, which resembles that 
envisaged for Picturephone® service, this would require a channel 
capacity of four megabits per second. We have measured the num- 
ber of mode changes per line for a number of different scenes and 
found that it was rarely more than six. If we used a 4-bit word to 
describe each mode change, then the time required for transmitting 
the information is rarely more than six microseconds. This is con- 
siderably less than the line blanking interval of 13 microseconds for 
our present scan format. Such a system, although requiring a frame 
memory, is essentially bufferless as a 6-word shift register suffices to 
buffer the above mode switching information. Applying spatial sub- 
sampling techniques to element differential PCM is not straightfor- 
ward because when we halve the horizontal sampling frequency we 
no longer have such near neighbors for predicting the value of an 
upcoming element and therefore quantizing noise increases. Two pos- 
sible alternatives are: 

(i) Use vertically adjacent elements for predicting the upcoming 
element. Because of interlace such elements are temporally 
displaced by one field time from the current line and may be 
unsatisfactory for predicting values in the current line in moving 
areas. This problem would not arise with a sequential, non-
interlaced, scan format.
(ii) Reduce only vertical resolution by sampling alternate lines in
each field. Such a process may give rise to an unpleasant rolling 
pattern in the picture. 
We describe experiments in which resolution exchange is applied to 
an element differential PCM signal in a forthcoming publication. 
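
The channel capacity and timing figures quoted earlier in this section for 4-bit frame difference PCM with 2:1 resolution exchange can be checked with a short calculation. The sketch assumes that only the 271 x 248 picture elements of each frame are counted and that blanking intervals are ignored, which the text does not state; the function names are hypothetical.

```python
LINES_PER_FRAME = 271
PELS_PER_LINE = 248
FRAMES_PER_SECOND = 30


def channel_rate_bits_per_s(bits_per_sample, subsampling_factor):
    """Bit rate when one element in `subsampling_factor` is transmitted."""
    pels_per_second = LINES_PER_FRAME * PELS_PER_LINE * FRAMES_PER_SECOND
    return pels_per_second * bits_per_sample / subsampling_factor


def mode_info_time_us(mode_changes, bits_per_change, rate_bits_per_s):
    """Time in microseconds to send the mode-change words for one line."""
    return mode_changes * bits_per_change / rate_bits_per_s * 1e6


# 4-bit words with 2:1 subsampling: about 4.03 megabits per second.
rate = channel_rate_bits_per_s(4, 2)

# Six 4-bit mode-change words at 4 megabits per second: about 6 microseconds,
# well inside the 13-microsecond line blanking interval.
blanking_use = mode_info_time_us(6, 4, 4e6)
```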

A third form of coding which is being considered is conditional pic- 
ture element replenishment 10 in which only the signal corresponding to 
the moving part of the picture is transmitted. Such a system requires 
a large buffer store which can become overloaded when the pictures 
contain large areas of rapid movement. The use of spatial subsampling, 
either 2:1 or 4:1, in such moving areas is one way of reducing or pre-
venting buffer overflow. 9 - 11 If we are aiming for a bit rate of one bit
per picture element per frame we can simulate the worst possible case
by spatially subsampling by 4:1 the whole picture encoded as 4-bit 
frame differential PCM. The resulting picture looks much better than 
the picture tearup which occurs when a buffer overloads 10 and, hence, 
this mode can be used whenever the buffer approaches overflow. 